Experimental design and pre-processing

Data were processed with the nf-core/rnaseq pipeline (revision 3.14.0), using STAR/RSEM for abundance estimation against Homo sapiens GRCh38 (release 86).

sample_id   sample_name  condition  condition_name     plate_id
ING7705A1   C_1          C          control            1
ING7705A2   DN_1         DN         SOX17neg_FOXC1neg  1
ING7705A3   DP_1         DP         SOX17pos_FOXC1pos  1
ING7705A4   A_1          A          SOX17neg_FOXC1pos  1
ING7705A5   S17_1        S17        SOX17pos_FOXC1neg  1
ING7705A6   C_2          C          control            2
ING7705A7   DN_2         DN         SOX17neg_FOXC1neg  2
ING7705A8   DP_2         DP         SOX17pos_FOXC1pos  2
ING7705A9   A_2          A          SOX17neg_FOXC1pos  2
ING7705A10  S17_2        S17        SOX17pos_FOXC1neg  2
ING7705A11  C_3          C          control            3
ING7705A12  DN_3         DN         SOX17neg_FOXC1neg  3
ING7705A13  DP_3         DP         SOX17pos_FOXC1pos  3
ING7705A14  A_3          A          SOX17neg_FOXC1pos  3
ING7705A15  S17_3        S17        SOX17pos_FOXC1neg  3
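
The sample sheet above crosses five conditions with three plates. As a minimal sketch (not part of the original analysis), the following Python snippet parses the sheet and checks that every plate contains exactly one sample per condition, which is the layout that allows plate_id to act as a blocking factor. The function name `parse_sample_sheet` is a hypothetical helper introduced for illustration.

```python
from collections import Counter

# Sample sheet copied from the table above (whitespace-separated).
SAMPLE_SHEET = """\
sample_id sample_name condition condition_name plate_id
ING7705A1 C_1 C control 1
ING7705A2 DN_1 DN SOX17neg_FOXC1neg 1
ING7705A3 DP_1 DP SOX17pos_FOXC1pos 1
ING7705A4 A_1 A SOX17neg_FOXC1pos 1
ING7705A5 S17_1 S17 SOX17pos_FOXC1neg 1
ING7705A6 C_2 C control 2
ING7705A7 DN_2 DN SOX17neg_FOXC1neg 2
ING7705A8 DP_2 DP SOX17pos_FOXC1pos 2
ING7705A9 A_2 A SOX17neg_FOXC1pos 2
ING7705A10 S17_2 S17 SOX17pos_FOXC1neg 2
ING7705A11 C_3 C control 3
ING7705A12 DN_3 DN SOX17neg_FOXC1neg 3
ING7705A13 DP_3 DP SOX17pos_FOXC1pos 3
ING7705A14 A_3 A SOX17neg_FOXC1pos 3
ING7705A15 S17_3 S17 SOX17pos_FOXC1neg 3
"""


def parse_sample_sheet(text):
    """Hypothetical helper: turn the sheet into a list of row dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


samples = parse_sample_sheet(SAMPLE_SHEET)
plates = sorted({s["plate_id"] for s in samples})
conditions = sorted({s["condition"] for s in samples})

# Count samples in each (plate, condition) cell of the design.
cells = Counter((s["plate_id"], s["condition"]) for s in samples)
balanced = all(cells[(p, c)] == 1 for p in plates for c in conditions)
print("plates:", plates, "conditions:", conditions, "balanced:", balanced)
```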

QC

Counts (raw)

Raw read counts for each gene in each sample.

Counts (normalised)

Counts normalised using per-sample size factors.
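
The report does not state how the size factors were estimated. Assuming they follow DESeq2's default median-of-ratios approach, the idea can be sketched in plain Python as below. The function names and the toy counts are hypothetical and only illustrate the arithmetic.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one per sample.
    Assumption: genes with a zero count in any sample are excluded
    from the reference, as their geometric mean is zero.
    """
    n_samples = len(next(iter(counts.values())))
    # Log of the per-gene geometric mean across samples.
    log_geo_means = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(counts[gene][j]) - lgm
            for gene, lgm in log_geo_means.items()
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalise(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {
        gene: [c / f for c, f in zip(row, factors)]
        for gene, row in counts.items()
    }


# Toy data (hypothetical): sample 2 is sequenced twice as deeply as sample 1.
toy_counts = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
toy_factors = size_factors(toy_counts)
toy_norm = normalise(toy_counts, toy_factors)
```

In the toy data the two size factors differ by a factor of two, so the normalised counts of g1, g2 and g3 become equal across the two samples.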

Expressed genes

Expressed genes are defined as those with a count > 0 in at least one sample.
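
A minimal sketch of this filter, written in Python for illustration (the function name, parameter names and toy counts are hypothetical):

```python
def expressed_genes(counts, min_count=0, min_samples=1):
    """Return genes with count > min_count in at least min_samples samples.

    counts: dict mapping gene -> list of counts, one per sample.
    """
    return [
        gene
        for gene, row in counts.items()
        if sum(c > min_count for c in row) >= min_samples
    ]


# Toy data (hypothetical): only the all-zero gene is dropped.
toy_counts = {
    "g1": [0, 0, 0],
    "g2": [0, 4, 0],
    "g3": [12, 9, 15],
}
kept = expressed_genes(toy_counts)
```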


Exploratory analysis

Sample distance

Heatmap of sample-to-sample distances to assess overall similarity between samples. Distances are computed with a Poisson dissimilarity metric, which is fairly robust to differences in library size between samples.

PCA

Principal Component Analysis (PCA) based on the variance-stabilised abundance estimates, using all available genes. The majority of the variance in the dataset is captured by the first two principal components and is associated with a time effect.

Screeplot

Bi-plot condition

Bi-plot replicate

Top 10 Loadings

Eigen-correlation

Heatmaps (most variable)

Heatmaps of the most variable genes across all samples. The top 500 genes ranked by coefficient of variation (sd/mean) are displayed. Note that these do not necessarily correspond to genes that change significantly between conditions. These visualisations are carried out blind to the experimental design. We would expect samples from the same experimental group to cluster together.
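
The ranking by coefficient of variation can be sketched as follows. This is an illustration in Python, not the code used for the report: the function name is hypothetical, and the report does not say which abundance scale the sd and mean are computed on.

```python
from statistics import mean, stdev


def top_variable_genes(expr, n=500):
    """Rank genes by coefficient of variation (sd / mean) and keep the top n.

    expr: dict mapping gene -> list of abundance values, one per sample.
    Genes with a non-positive mean are skipped because the ratio is undefined.
    """
    cv = {}
    for gene, values in expr.items():
        m = mean(values)
        if m > 0:
            cv[gene] = stdev(values) / m  # sample sd (n - 1 denominator)
    return sorted(cv, key=cv.get, reverse=True)[:n]


# Toy data (hypothetical values).
toy_expr = {
    "flat": [10, 10, 10],
    "var": [1, 10, 100],
    "mild": [9, 10, 11],
    "zero": [0, 0, 0],
}
top2 = top_variable_genes(toy_expr, n=2)
```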


Differential Gene Expression Analysis - Pairwise Comparisons

Design formula: ~ plate_id + condition

Pairwise differential expression between condition groups was assessed using DESeq2’s Wald test. Genes were called significant at an independent hypothesis weighting (IHW) adjusted p-value of < 0.05, together with a minimum absolute log2 fold-change of > 0 and a minimum baseMean (i.e. mean abundance across all samples) of > 5. Note that log2 fold changes were shrunk using the “ashr” method prior to filtering.

Thresholds applied:

  • padj < 0.05
  • |log2FC| > 0
  • baseMean > 5

comparison  up    down  total
DN_vs_C     2765  2687  5452
DP_vs_C     4644  4419  9063
A_vs_C      4499  4229  8728
S17_vs_C    4599  4216  8815
DP_vs_DN    3573  3529  7102
A_vs_DN     3392  3346  6738
S17_vs_DN   3196  2662  5858
DP_vs_A     1864  1836  3700
DP_vs_S17   1448  1833  3281
S17_vs_A    2572  2113  4685
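
As a rough illustration of how the thresholds translate into the up/down counts in the table, the sketch below filters a results table in Python. It assumes the fold-change threshold is applied to the absolute shrunken log2 fold change and that the sign of that value assigns the direction. The column names follow DESeq2's results table, while the function name and the toy rows are hypothetical.

```python
def call_differential(results, padj_max=0.05, min_abs_lfc=0.0, min_base_mean=5.0):
    """Split genes passing all three thresholds into up and down lists.

    results: dict mapping gene -> dict with keys
             'baseMean', 'log2FoldChange' and 'padj' (padj may be None).
    """
    up, down = [], []
    for gene, row in results.items():
        padj = row["padj"]
        if padj is None or padj >= padj_max:
            continue  # not significant, or no adjusted p-value available
        if row["baseMean"] <= min_base_mean:
            continue  # abundance too low
        lfc = row["log2FoldChange"]
        if abs(lfc) <= min_abs_lfc:
            continue  # effect size below threshold
        (up if lfc > 0 else down).append(gene)
    return up, down


# Toy rows (hypothetical, not taken from the report).
toy_results = {
    "g_up": {"baseMean": 50.0, "log2FoldChange": 1.2, "padj": 0.001},
    "g_down": {"baseMean": 20.0, "log2FoldChange": -0.8, "padj": 0.01},
    "g_ns": {"baseMean": 100.0, "log2FoldChange": 2.0, "padj": 0.2},
    "g_low": {"baseMean": 3.0, "log2FoldChange": 1.5, "padj": 0.001},
    "g_na": {"baseMean": 40.0, "log2FoldChange": 0.5, "padj": None},
}
up, down = call_differential(toy_results)
summary = {"up": len(up), "down": len(down), "total": len(up) + len(down)}
```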

MA plots

DN_vs_C

DP_vs_C

A_vs_C

S17_vs_C

DP_vs_DN

A_vs_DN

S17_vs_DN

DP_vs_A

DP_vs_S17

S17_vs_A

Gene list overlaps

Assess the overlap of differential genes between the various comparisons. Note that this is agnostic of direction of change.
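
A direction-agnostic overlap can be sketched by pooling the up and down lists of each comparison into one set and intersecting the sets pairwise. The Python below is illustrative only: the function names, comparison labels and gene names are hypothetical.

```python
from itertools import combinations


def differential_sets(calls):
    """calls: dict comparison -> (up_genes, down_genes). Direction is discarded."""
    return {name: set(up) | set(down) for name, (up, down) in calls.items()}


def pairwise_overlaps(de_sets):
    """Number of shared differential genes for every pair of comparisons."""
    return {
        (a, b): len(de_sets[a] & de_sets[b])
        for a, b in combinations(sorted(de_sets), 2)
    }


# Toy gene lists (hypothetical). geneA is up in cmp_1 and down in cmp_2,
# yet still counts as shared because direction is ignored.
toy_calls = {
    "cmp_1": (["geneA", "geneB"], ["geneC"]),
    "cmp_2": (["geneC"], ["geneA", "geneD"]),
    "cmp_3": (["geneE"], []),
}
de_sets = differential_sets(toy_calls)
overlaps = pairwise_overlaps(de_sets)
```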

Differential Heatmaps (top genes)

Heatmaps of the top 30 most up-regulated and top 30 most down-regulated differentially expressed genes per comparison. If fewer than 30 genes are called differential in either direction, then only those are displayed. If there are no differential genes then the heatmap is skipped.
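
The selection rule described above can be sketched as follows. The report does not say which statistic defines the "top" genes, so this Python illustration assumes ranking by shrunken log2 fold change. The function name and toy values are hypothetical.

```python
def top_genes(de, n=30):
    """Pick up to n most up- and n most down-regulated genes.

    de: dict mapping gene -> log2 fold change, restricted to genes
        already called differential.
    Returns None when there are no differential genes (heatmap skipped).
    """
    up = sorted((g for g, lfc in de.items() if lfc > 0), key=de.get, reverse=True)[:n]
    down = sorted((g for g, lfc in de.items() if lfc < 0), key=de.get)[:n]
    if not up and not down:
        return None
    return up, down


# Toy values (hypothetical). With fewer than n genes in a direction,
# only the available genes are returned.
toy_de = {"g1": 3.0, "g2": 1.0, "g3": 2.0, "g4": -0.5, "g5": -4.0}
selected = top_genes(toy_de, n=2)
```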

DN_vs_C

DP_vs_C

A_vs_C

S17_vs_C

DP_vs_DN

A_vs_DN

S17_vs_DN

DP_vs_A

DP_vs_S17

S17_vs_A

Differential Heatmaps (all differential genes)

Heatmap derived from a combined view of all differentially expressed genes from all available tests (13680 genes).


sessionInfo

## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] grid      stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] writexl_1.5.2               openxlsx_4.2.8             
##  [3] kableExtra_1.4.0            clusterProfiler_4.14.6     
##  [5] PoiClaClu_1.0.2.1           PCAtools_2.18.0            
##  [7] ggrepel_0.9.6               ashr_2.2-63                
##  [9] IHW_1.34.0                  edgeR_4.4.2                
## [11] limma_3.62.2                DESeq2_1.46.0              
## [13] SummarizedExperiment_1.36.0 MatrixGenerics_1.18.1      
## [15] matrixStats_1.5.0           tximport_1.34.0            
## [17] org.Hs.eg.db_3.20.0         UpSetR_1.4.0               
## [19] GenomicFeatures_1.58.0      AnnotationDbi_1.68.0       
## [21] Biobase_2.66.0              BiocParallel_1.40.0        
## [23] scales_1.3.0                reshape2_1.4.4             
## [25] viridis_0.6.5               viridisLite_0.4.2          
## [27] pheatmap_1.0.12             circlize_0.4.16            
## [29] ComplexHeatmap_2.22.0       RColorBrewer_1.1-3         
## [31] plyranges_1.26.0            GenomicRanges_1.58.0       
## [33] GenomeInfoDb_1.42.3         IRanges_2.40.1             
## [35] S4Vectors_0.44.0            BiocGenerics_0.52.0        
## [37] lubridate_1.9.4             forcats_1.0.0              
## [39] stringr_1.5.1               dplyr_1.1.4                
## [41] purrr_1.0.4                 readr_2.1.5                
## [43] tidyr_1.3.1                 tibble_3.2.1               
## [45] ggplot2_3.5.1               tidyverse_2.0.0            
## [47] here_1.0.1                 
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.4.1             BiocIO_1.16.0            
##   [3] ggplotify_0.1.2           bitops_1.0-9             
##   [5] R.oo_1.27.0               lpsymphony_1.34.0        
##   [7] XML_3.99-0.18             lifecycle_1.0.4          
##   [9] mixsqp_0.3-54             doParallel_1.0.17        
##  [11] rprojroot_2.0.4           vroom_1.6.5              
##  [13] lattice_0.22-6            magrittr_2.0.3           
##  [15] sass_0.4.9                rmarkdown_2.29           
##  [17] jquerylib_0.1.4           yaml_2.3.10              
##  [19] ggtangle_0.0.6            zip_2.3.2                
##  [21] cowplot_1.1.3             DBI_1.2.3                
##  [23] abind_1.4-8               zlibbioc_1.52.0          
##  [25] R.utils_2.13.0            RCurl_1.98-1.17          
##  [27] yulab.utils_0.2.0         GenomeInfoDbData_1.2.13  
##  [29] enrichplot_1.26.6         irlba_2.3.5.1            
##  [31] tidytree_0.4.6            dqrng_0.4.1              
##  [33] svglite_2.1.3             DelayedMatrixStats_1.28.1
##  [35] codetools_0.2-20          DelayedArray_0.32.0      
##  [37] xml2_1.3.8                DOSE_4.0.1               
##  [39] tidyselect_1.2.1          shape_1.4.6.1            
##  [41] aplot_0.2.5               farver_2.1.2             
##  [43] UCSC.utils_1.2.0          ScaledMatrix_1.14.0      
##  [45] GenomicAlignments_1.42.0  jsonlite_2.0.0           
##  [47] GetoptLong_1.0.5          iterators_1.0.14         
##  [49] systemfonts_1.2.1         foreach_1.5.2            
##  [51] tools_4.4.1               treeio_1.30.0            
##  [53] Rcpp_1.0.14               glue_1.8.0               
##  [55] gridExtra_2.3             SparseArray_1.6.2        
##  [57] xfun_0.51                 qvalue_2.38.0            
##  [59] withr_3.0.2               fastmap_1.2.0            
##  [61] digest_0.6.37             rsvd_1.0.5               
##  [63] truncnorm_1.0-9           gridGraphics_0.5-1       
##  [65] timechange_0.3.0          R6_2.6.1                 
##  [67] colorspace_2.1-1          GO.db_3.20.0             
##  [69] RSQLite_2.3.9             R.methodsS3_1.8.2        
##  [71] generics_0.1.3            data.table_1.17.0        
##  [73] rtracklayer_1.66.0        httr_1.4.7               
##  [75] S4Arrays_1.6.0            pkgconfig_2.0.3          
##  [77] gtable_0.3.6              blob_1.2.4               
##  [79] XVector_0.46.0            htmltools_0.5.8.1        
##  [81] fgsea_1.32.4              clue_0.3-66              
##  [83] png_0.1-8                 ggfun_0.1.8              
##  [85] knitr_1.50                rstudioapi_0.17.1        
##  [87] tzdb_0.5.0                rjson_0.2.23             
##  [89] nlme_3.1-168              curl_6.2.2               
##  [91] cachem_1.1.0              GlobalOptions_0.1.2      
##  [93] parallel_4.4.1            restfulr_0.0.15          
##  [95] pillar_1.10.1             vctrs_0.6.5              
##  [97] slam_0.1-55               BiocSingular_1.22.0      
##  [99] beachmat_2.22.0           cluster_2.1.8.1          
## [101] evaluate_1.0.3            invgamma_1.1             
## [103] cli_3.6.4                 locfit_1.5-9.12          
## [105] compiler_4.4.1            Rsamtools_2.22.0         
## [107] rlang_1.1.5               crayon_1.5.3             
## [109] SQUAREM_2021.1            labeling_0.4.3           
## [111] fdrtool_1.2.18            plyr_1.8.9               
## [113] fs_1.6.5                  stringi_1.8.7            
## [115] munsell_0.5.1             Biostrings_2.74.1        
## [117] lazyeval_0.2.2            GOSemSim_2.32.0          
## [119] Matrix_1.7-3              patchwork_1.3.0          
## [121] hms_1.1.3                 sparseMatrixStats_1.18.0 
## [123] bit64_4.6.0-1             KEGGREST_1.46.0          
## [125] statmod_1.5.0             igraph_2.1.4             
## [127] memoise_2.0.1             bslib_0.9.0              
## [129] ggtree_3.14.0             fastmatch_1.1-6          
## [131] bit_4.6.0                 gson_0.1.0               
## [133] ape_5.8-1